Latent Semantic Indexing for Patent Documents
Abstract
As the already huge database of patent documents continues to grow, classifying, updating and retrieving patent documents has become an acute necessity. We therefore investigate the efficiency of applying Latent Semantic Indexing (LSI), an automatic indexing method for information retrieval, to some classes of patent documents from the United States Patent Classification System. We present experiments that determine the optimal number of dimensions for the latent semantic space, and we compare the performance of LSI with that of the Vector Space Model (VSM) on real-life text documents, namely patent documents. However, we do not strongly recommend LSI as an improved alternative to the VSM, since the results are not significantly better.
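To make the LSI versus VSM comparison concrete, here is a minimal sketch in Python with NumPy. It is not the authors' experimental setup: the five-term, four-document matrix, the raw term counts (no weighting), the function names and the choice of k = 2 are all assumptions made for illustration. Documents and the query are both projected onto the top k left singular vectors, which is one common way to build the latent space.

```python
import numpy as np

def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between a query vector and each column of doc_matrix."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_matrix, axis=0)
    denom[denom == 0] = 1.0  # guard against all-zero vectors
    return (query_vec @ doc_matrix) / denom

def lsi_project(term_doc, k):
    """Truncated SVD of a term-by-document matrix.

    Returns U_k (terms x k) and the documents expressed in the
    k-dimensional latent space (k x documents).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    docs_k = np.diag(s[:k]) @ Vt[:k, :]  # equals U_k.T @ term_doc
    return U_k, docs_k

# Hypothetical toy vocabulary and documents (rows = terms, columns = documents).
terms = ["engine", "motor", "piston", "antenna", "signal"]
term_doc = np.array([
    [1, 0, 0, 0],  # engine
    [0, 1, 0, 0],  # motor
    [1, 1, 0, 0],  # piston
    [0, 0, 1, 1],  # antenna
    [0, 0, 1, 1],  # signal
], dtype=float)

# Query containing only the term "engine".
query = np.zeros(len(terms))
query[terms.index("engine")] = 1.0

# VSM: compare the query with the documents in the original term space.
vsm_scores = cosine_scores(query, term_doc)

# LSI: compare them in a k-dimensional latent space (k is a tunable assumption).
k = 2
U_k, docs_k = lsi_project(term_doc, k)
query_k = U_k.T @ query
lsi_scores = cosine_scores(query_k, docs_k)

print("VSM:", np.round(vsm_scores, 3))
print("LSI:", np.round(lsi_scores, 3))
```

In this toy matrix the second document contains "motor" and "piston" but not "engine", so the VSM gives it a score of zero for the query, while the rank-2 latent space scores it as highly similar because both documents share "piston". Choosing k is the dimension-selection question the abstract refers to; in practice it would be tuned against retrieval quality on real data.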
Similar resources
Leveraging Category-based LSI for Patent Retrieval
Latent Semantic Indexing (LSI) has been employed to reduce the dimensionality of document indices for similarity search. In this paper, we describe a method for retrieving conceptually similar patents by first categorizing the patent collection and then applying the LSI algorithm separately to each category. The main strategy is keeping the algorithm as simple as possible, while achieving the sc...
Comparison of Information Retrieval Techniques: Latent Semantic Indexing and Concept Indexing
The task of information retrieval is to extract relevant documents for a certain query from the collection of documents. As large sets of documents are now increasingly common, there is a growing need for fast and efficient information retrieval algorithms. The algorithms we are dealing with are embedded in the vector space model. In this paper we compare two information retrieval techniques: l...
Using Random Indexing to improve Singular Value Decomposition for Latent Semantic Analysis
We present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing and Latent Semantic Analysis on Random Indexing reduced matrices. In this study we use a corpus comprising 1003 documents from the MEDLINE-corpus. Our results show that Latent Semantic Analysis on Random Index...
Classification and clustering methods for documents by probabilistic latent semantic indexing model
Based on information retrieval models, especially the probabilistic latent semantic indexing (PLSI) model, we discuss methods for classifying and clustering a set of documents. A classification method is presented, and its good performance is demonstrated by applying it to a set of benchmark documents in free format (text only). Then the classification method is modified to a clustering metho...
Latent Semantic Indexing (LSI) and TREC-2
Latent Semantic Indexing (LSI) is an extension of the vector retrieval method (e.g., Salton & McGill, 1983) in which the dependencies between terms are explicitly taken into account in the representation and exploited in retrieval. This is done by simultaneously modeling all the interrelationships among terms and documents. We assume that there is some underlying or "latent" structure in the pa...